Hypothesis Testing
The Null
The Null is usually “The World is as it is and my intervention had no effect.” It’s usually the most conservative position you can imagine: your exertions had No Effect. Here are some typical Nulls:
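| Test type | Typical $H_0$ | Typical $H_1$ |
|---|---|---|
| Mean difference | $\mu_1 - \mu_2 = 0$ | $\mu_1 - \mu_2 \neq 0$ |
| Risk ratio / odds ratio | RR = 1 | RR ≠ 1 |
| Hazard ratio (Cox) | HR = 1 | HR ≠ 1 |
| Correlation | $\rho = 0$ | $\rho \neq 0$ |
| Regression coefficient | $\beta = 0$ | $\beta \neq 0$ |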
The Alternative Hypothesis is usually “Yeah my shit might have done something to The World” as measured by some estimate. You’re trying to see if you can falsify the Null (thanks Popper et al.).
You never accept the Alternative/Research Hypothesis! Falsifiability FTW! You either reject or fail to reject the Null Hypothesis.
A Bounded Null
Do you ever set $H_0: \mu \neq 5$ … ?
Nope. Think about what you’d say if you rejected the null in $H_0: \mu \neq 5$. “The mean is exactly 5.” That’s weird statistically and kinda philosophically.
And you’re testing the Null by calculating a sampling distribution under $H_0$ and asking “How surprising is my data?” You answer this question by computing all manner of Test Statistics (e.g. $t$, $z$, $\chi^2$) and then computing the p-Value.
So which value from $\mu \neq 5$ are you going to plug in that’s not $5$? Yep. Each one will give you a different distribution. Just don’t do it.
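To see why the Null has to pin down one single value, here’s a minimal sketch of a one-sample $t$ statistic. The function name `one_sample_t`, the sample, and the candidate values are all made up for illustration. The point is only that the statistic (and so the whole reference distribution) moves with whatever value you plug in.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for the point null H0: mu = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample SD (n - 1 denominator)
    return (mean(sample) - mu0) / standard_error


# Hypothetical measurements, invented for this example.
sample = [5.2, 4.9, 5.6, 5.1, 5.4, 5.3]

# Same data, different hypothesised means: a different statistic every time.
for mu0 in (4.0, 5.0, 6.0):
    print(f"mu0 = {mu0}: t = {one_sample_t(sample, mu0):.2f}")
```

A Null like “the mean is anything except 5” would leave `mu0` with infinitely many candidates, so there’d be no single statistic to compute.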
p-Value
If you want to make a falsifiable claim (thanks Popper) about The World, a p-value is as easy as this:
What is the probability of seeing what I saw in my experiment, or something even more extreme, if the null hypothesis is true?¹
78%? Well that sounds bad. You fail to reject the null. 5%? That’s small. Maybe something’s going on? 0.1%? Okay maybe something’s really going on. “Something” here means association, not causation.
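One way to make that question concrete is a permutation test. It isn’t a method these notes cover elsewhere, but it maps almost word for word onto the definition: if the Null is true, group labels are arbitrary, so shuffle them and count how often the shuffled difference is at least as big as the one you saw. This is a rough sketch with invented data; `permutation_p_value`, `treated`, and `control` are hypothetical names.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under the Null, labels carry no information
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up measurements, for illustration only.
treated = [5.1, 5.8, 6.2, 6.0, 5.9, 6.4]
control = [4.2, 4.0, 4.5, 3.9, 4.4, 4.1]

print(permutation_p_value(treated, control))  # small: the groups barely overlap
print(permutation_p_value(control, control))  # identical groups, nothing surprising
```

A small value here still only says the data would be surprising under the Null. It says nothing about why.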
Confidence Intervals
You’ve seen them. “RR = 1.5 95% CI [1.3,1.6]”. What do they mean? Do they mean that you’re 95% sure the true value is somewhere in there?
Nope! Common mistake². You’re saying that if you repeated your experiment many times and built an interval each time, the interval would ‘wiggle’ (different sample, other rando effects), but about 95% of those intervals would contain the true value. That’s all.
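The repeated-experiment reading is easy to check by simulation. The sketch below leans on strong assumptions picked purely for convenience: normally distributed data, a known standard deviation, and a plain $z$-interval for the mean. The function name `ci_coverage` and every default value are made up. It reruns the “experiment” thousands of times and counts how often the interval catches the true mean.

```python
import random
from statistics import NormalDist


def ci_coverage(true_mean=10.0, sigma=2.0, n=30,
                n_experiments=4000, level=0.95, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5          # known-sigma interval half-width

    hits = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = sum(sample) / n               # this experiment's estimate
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / n_experiments


print(ci_coverage())  # should land close to 0.95
```

Any single interval either contains the true mean or it doesn’t. The 95% describes the procedure across repeats, not one particular interval.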
Crossing the Null
“Which Test?” TLDR
To pick a test, generally speaking, you’ll be asking:
- What is the nature of my Data³? Continuous? Categorical?
- How many groups am I dealing with? One, two, or more than two?
Here’s a nice little table from this excellent video (by a Columbia alum!)
| | 1 Group | 2 Groups | 2+ Groups |
|---|---|---|---|
| Categorical Data | Proportion Test ($z$-test approx.), $\chi^2$ Test | Proportion Test ($z$-test approx.), $\chi^2$ Test | $\chi^2$ Test |
| Continuous Data | $z$-test & Variants, $t$-test & Variants | $z$-test & Variants, $t$-test & Variants | ANOVA ($F$-test, 1-way, 2-way) |
| Classic Assumptions Violated⁴ | Sign Test, Signed Rank Test | Wilcoxon–Mann–Whitney Test, Paired $t$-test, McNemar’s Test | Kruskal–Wallis Test |
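If you’d rather have the grid as a lookup, here’s a small sketch. It’s just the table restated as a dictionary, on the assumption that the symbols missing from the table are $z$, $t$, $\chi^2$ and $F$. The names `suggest_tests`, `CLASSIC_TESTS` and `FALLBACK_TESTS` are invented for this example, and the output is a list of candidates to go read about, not a verdict.

```python
# Candidate tests when the classic assumptions hold,
# keyed by (data type, number-of-groups bucket).
CLASSIC_TESTS = {
    ("categorical", "1"): ["proportion test (z-test approx.)", "chi-squared test"],
    ("categorical", "2"): ["proportion test (z-test approx.)", "chi-squared test"],
    ("categorical", "2+"): ["chi-squared test"],
    ("continuous", "1"): ["z-test and variants", "t-test and variants"],
    ("continuous", "2"): ["z-test and variants", "t-test and variants"],
    ("continuous", "2+"): ["ANOVA (F-test, 1-way, 2-way)"],
}

# Candidates from the "Classic Assumptions Violated" row, keyed by bucket only.
FALLBACK_TESTS = {
    "1": ["sign test", "signed rank test"],
    "2": ["Wilcoxon-Mann-Whitney test", "paired t-test", "McNemar's test"],
    "2+": ["Kruskal-Wallis test"],
}


def group_bucket(n_groups):
    """Map a group count onto the table's three columns."""
    if n_groups < 1:
        raise ValueError("need at least one group")
    if n_groups == 1:
        return "1"
    if n_groups == 2:
        return "2"
    return "2+"


def suggest_tests(data_type, n_groups, assumptions_ok=True):
    """Return the table cell matching the two questions above."""
    bucket = group_bucket(n_groups)
    if not assumptions_ok:
        return list(FALLBACK_TESTS[bucket])
    key = (data_type.lower(), bucket)
    if key not in CLASSIC_TESTS:
        raise ValueError(f"unknown data type: {data_type!r}")
    return list(CLASSIC_TESTS[key])


print(suggest_tests("continuous", 3))
print(suggest_tests("categorical", 2, assumptions_ok=False))
```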